ABSTRACT



The bootstrap method is a nonparametric statistical technique for estimating the sampling
distribution of estimators of unknown parameters. While the asymptotic theory for the
bootstrap is well established, this thesis investigates the behavior of the bootstrap for
small sample sizes. For the exponential distribution and for normal linear regression, the
bootstrap estimates of the parameters and their variances are compared with the
theoretical sampling distributions. The small sample properties of bootstrap confidence
intervals using the percentile method and the bias-corrected percentile method are also
investigated.
THESIS DISCLAIMER

The reader is cautioned that computer programs developed in this research may not
have been exercised for all cases of interest. While every effort has been made, within
the time available, to ensure that the programs are free of computational and logic
errors, they cannot be considered validated. Any application of these programs without
additional verification is at the risk of the user.
I. INTRODUCTION



The bootstrap method, a statistical technique for estimating the sampling distributions
of estimators of unknown parameters, was introduced by Efron [Ref. 1] in the mid 1970s.
This computer-intensive method is nonparametric in nature and relies on repeated
resampling (bootstrapping) from the observed values of a random sample.

Suppose $x_1, x_2, x_3, \ldots, x_n$ are the observed values of a random sample of size $n$,
$X_1, X_2, X_3, \ldots, X_n$, from a distribution $f_X(x; \theta)$. Let
$\hat{\theta} = h(X_1, X_2, X_3, \ldots, X_n)$ be an estimator for the unknown parameter
$\theta$. The sampling distribution of $\hat{\theta}$ completely describes the properties
of the estimator, and its knowledge would be useful for investigative purposes. However,
in many situations the analytical derivation of this distribution may be quite demanding.
An alternative approach is to estimate the sampling distribution using bootstrap methods.
A set of $N$ bootstrap samples of size $n$, $x^*_{j1}, x^*_{j2}, x^*_{j3}, \ldots, x^*_{jn}$
for $j = 1, 2, 3, \ldots, N$, is generated by repeated uniform sampling with replacement
from the set $\{x_1, x_2, x_3, \ldots, x_n\}$. The estimate
$\hat{\theta}^*_j = h(x^*_{j1}, x^*_{j2}, x^*_{j3}, \ldots, x^*_{jn})$ is computed for each
of the $N$ bootstrap samples. The empirical distribution of the $\hat{\theta}^*_j$ for
$j = 1, 2, 3, \ldots, N$ is taken as the estimate of the sampling distribution of
$\hat{\theta}$.
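
The resampling procedure described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the code used in this research: the
function names `bootstrap_replicates` and `sample_mean` are hypothetical, and Python's
standard `random` module stands in for whatever random number generator is actually used.

```python
import random


def bootstrap_replicates(sample, estimator, num_replications, rng=None):
    """Return the N estimates h(x*_j), one per bootstrap sample (illustrative helper)."""
    rng = rng or random.Random()
    n = len(sample)
    replicates = []
    for _ in range(num_replications):
        # Uniform sampling with replacement; the resample has the original size n.
        resample = rng.choices(sample, k=n)
        replicates.append(estimator(resample))
    return replicates


def sample_mean(values):
    """Example estimator h: the arithmetic mean."""
    return sum(values) / len(values)
```

The empirical distribution of the returned list plays the role of the bootstrap
distribution; any estimator that accepts a list of observations can be passed in place of
`sample_mean`.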

Efron [Ref. 1] showed that the bootstrap estimator is consistent, and Beran et al.
[Refs. 2, 3] proved that under fairly general regularity conditions the bootstrap
distribution converges to the true sampling distribution as $n \to \infty$ and
$N \to \infty$. It has also been demonstrated that bootstrap methods perform better than
some of the other resampling techniques, such as Hartigan's subsample method [Ref. 4] and
the Tukey-Quenouille Jackknife [Ref. 1].

Although the asymptotic behavior of the bootstrap has been well established by
theoretical research, some problems concerning the small sample properties of the
methods remain open for further investigation. One of these problems is the question of
how the original sample size $n$ and the number of bootstrap replications $N$ affect the
"closeness" of the bootstrap distribution to the exact sampling distribution. Another
concerns the applicability of bootstrap-based percentiles as a basis for estimating
confidence intervals for parameters. Information about these issues will be useful to a
practitioner in deciding how to employ the available resources.

The aim of this thesis is to address the two problems stated above. The approach taken is
to consider probability distributions and parameters for which the exact sampling
distributions of the estimators can be derived theoretically. The results of simulations
of the bootstrap method are compared with the theoretical results in order to analyze the
impact of the sample size $n$ and the number of bootstrap replications $N$ in the context
of relatively small samples.

Chapter II provides an overview of some bootstrap methods and their properties. 
In Chapter III the bootstrap method is applied to the maximum likelihood estimator 
of the scale parameter of the exponential distribution. The estimation of the parameters 
in normal linear regression is studied in Chapter IV. In Chapter V the conclusions are 
presented. 
II. BOOTSTRAP METHODS - AN OVERVIEW



A. THE BASIC BOOTSTRAP METHOD 

The bootstrap method is a resampling technique for estimating the sampling distribution
of an estimator of an unknown parameter of a probability distribution. Let
$\mathbf{X} = \{X_1, X_2, X_3, \ldots, X_n\}$ be a sample of size $n$ from a distribution
with probability density function $f_X(x; \theta)$ and distribution function
$F_X(x; \theta)$. Let $\hat{\theta} = h(\mathbf{X})$ be an estimator for the parameter
$\theta$. The distribution of $\hat{\theta}$, $g(\hat{\theta}; \theta)$, is called the
sampling distribution of $\hat{\theta}$. In many problems it may be quite difficult to
derive the sampling distribution analytically. But since computer resources are nowadays
inexpensive and easily available, methods like the bootstrap [Ref. 1], which will be
described below, can be used to estimate the distribution of $\hat{\theta}$.

Suppose $\mathbf{x} = \{x_1, x_2, x_3, \ldots, x_n\}$ are the observed values of the random
sample. A bootstrap sample $\mathbf{x}^* = \{x^*_1, x^*_2, x^*_3, \ldots, x^*_n\}$ ('*'
indicates bootstrapping) of size $n$ is obtained by randomly drawing with replacement
from the original sample $\mathbf{x}$. Another way of describing this resampling procedure
is the following: the empirical distribution function $\hat{F}$, which is discrete, is
constructed by assigning a probability mass $1/n$ to each of the original observations
$x_i$, and $n$ random samples are then drawn from $\hat{F}$. Although it is possible to
imagine, as Bickel and Freedman [Ref. 3] mention, bootstrap samples of an arbitrary size
$m$, mathematical theory [Ref. 3] indicates that the use of the same size $n$ as in the
original sample is preferable.

Before continuing the description of the bootstrap method it seems appropriate to 
summarize some properties of any bootstrap sample. Each element in a bootstrap 
sample is drawn independently from the original sample. So conditional on the original 
sample the probability that the jth element in a bootstrap sample is any one of the ori- 
ginal sample values is the same: 

$P\{X^*_j = x_i \mid \mathbf{x}\} = \frac{1}{n}$ for $i, j = 1, 2, 3, \ldots, n$. (2.1)

The expectation of $X^*_j$, conditional on $\mathbf{x}$, is

$E[X^*_j \mid \mathbf{x}] = \bar{x}$ for $j = 1, 2, 3, \ldots, n$, (2.2)

where $\bar{x}$ is the sample mean $\sum x_i / n$. Then, for example, the mean of the
bootstrap sample, $\bar{X}^* = \sum x^*_i / n$, has the conditional expectation

$E[\bar{X}^* \mid \mathbf{X}] = \bar{X}$ (2.3)

and the unconditional expectation

$E[\bar{X}^*] = E[E[\bar{X}^* \mid \mathbf{X}]] = \mu$. (2.4)

The variance of the mean of the bootstrap sample is

$\mathrm{Var}[\bar{X}^* \mid \mathbf{x}] = \frac{1}{n^2} \sum_{i=1}^{n} (x_i - \bar{x})^2$, (2.5)

which for $n \to \infty$ converges to $\mathrm{Var}[\bar{X}]$.
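
The step from the independence of the draws (equation 2.1) to the variance in equation
2.5 can be written out as follows. This is a short sketch of the standard argument, with
$\bar{X}^*$ denoting the mean of the bootstrap sample; it is not part of the original
derivation.

```latex
\begin{align*}
\mathrm{Var}[X^*_j \mid \mathbf{x}]
  &= \sum_{i=1}^{n} \frac{1}{n}\,(x_i - \bar{x})^2 , \\
\mathrm{Var}[\bar{X}^* \mid \mathbf{x}]
  &= \frac{1}{n^2} \sum_{j=1}^{n} \mathrm{Var}[X^*_j \mid \mathbf{x}]
   = \frac{1}{n^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align*}
```

The first line uses the fact that each draw takes the value $x_i$ with probability $1/n$
and has conditional mean $\bar{x}$; the second uses the conditional independence of the
$n$ draws.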

The process of obtaining one bootstrap sample and computing the estimator for this sample
is called a bootstrap replication. For the bootstrap method, $N$ bootstrap replications
are performed, where $N$ varies throughout the literature between 100 and 2000. This
means that $N$ bootstrap samples $\mathbf{x}^*_j$ for $j = 1, 2, 3, \ldots, N$ are obtained
and for each sample the estimator $\hat{\theta}^*_j = h(\mathbf{x}^*_j)$ is computed. The
bootstrap distribution, the empirical distribution of the $\hat{\theta}^*_j$, is then an
estimate of the sampling distribution of $\hat{\theta}$. The bootstrap estimate for
$\theta$ is defined by

$\hat{\theta}^* = \frac{1}{N} \sum_{j=1}^{N} \hat{\theta}^*_j$ (2.6)

and

$S^* = \sqrt{ \frac{1}{N-1} \sum_{j=1}^{N} \left( \hat{\theta}^*_j - \hat{\theta}^* \right)^2 }$ (2.7)

is the bootstrap estimate of $\sigma_{\hat{\theta}}$, the standard deviation of
$\hat{\theta}$.

Efron [Ref. 1] and Bickel and Freedman [Ref. 3] have shown that, under fairly general
regularity conditions, as $n \to \infty$ the bootstrap estimate and its standard deviation
converge to their actual values.
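
As an illustration of equations 2.6 and 2.7, the sketch below computes the bootstrap
estimate and its standard deviation from a list of bootstrap replicates. The function
names are hypothetical, and the divisor $N - 1$ in the standard deviation is an assumption
that follows the usual sample standard deviation convention.

```python
import math


def bootstrap_estimate(replicates):
    """Average of the N bootstrap replicates (equation 2.6)."""
    return sum(replicates) / len(replicates)


def bootstrap_std(replicates):
    """Standard deviation of the replicates (equation 2.7); divisor N - 1 is assumed."""
    num_replications = len(replicates)
    center = bootstrap_estimate(replicates)
    sum_of_squares = sum((r - center) ** 2 for r in replicates)
    return math.sqrt(sum_of_squares / (num_replications - 1))
```

For the replicates 1, 2, 3, 4 this gives a bootstrap estimate of 2.5 and a standard
deviation of $\sqrt{5/3}$.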
B. VARIATIONS OF THE BOOTSTRAP METHOD

This section briefly describes some variations of the bootstrap method to demon- 
strate the variety of options available to the practitioner. These methods however will 
not be the subject of investigations in this thesis. 

1. Parametric Variations of Bootstrap 

To improve the bootstrap method in those cases where additional information about the
underlying distribution is available, Efron proposed [Ref. 4] the Smoothed Bootstrap. The
major difference from the basic bootstrap is that the bootstrap samples are now obtained
by sampling from a continuous empirical distribution $\tilde{F}$. This distribution
$\tilde{F}$ is constructed by interpolating between the steps of the discrete empirical
distribution $\hat{F}$ using an appropriate smoothing function. Efron points out that the
choice of the function is not arbitrary. To improve on the results of the basic
bootstrap, the type of function selected has to be compatible with the distribution under
investigation. This variation of the method is therefore no longer nonparametric in an
absolute sense.

If the exact distribution of the $X_i$ is known except for the values of the parameters,
this distribution can be used to perform the smoothing; Efron [Ref. 4] calls this method
the Parametric Bootstrap.

2. The Balanced Bootstrap 

Davison, Hinkley and Schechtman [Ref. 5] introduced the Balanced Bootstrap to eliminate
the linear component of the bias of bootstrap estimators. Their method obtains the $N$
bootstrap samples by first concatenating the vector of $n$ original observations $N$
times, randomly permuting the resulting vector and then taking $N$ successive vectors of
size $n$, ensuring that each $x_i$ occurs exactly $N$ times in the total of $N$ bootstrap
samples. It is easily seen that when an estimator $h(\mathbf{X})$ for $\theta$ is linear
and symmetric in $\mathbf{X}$, then

$\hat{\theta}^* = h(\mathbf{x}) = \hat{\theta}$. (2.8)
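
The construction of balanced bootstrap samples described above can be sketched as follows.
The function name `balanced_bootstrap_samples` is hypothetical and Python's `random.shuffle`
is assumed as the source of the random permutation; the sketch only illustrates the
concatenate, permute and split steps.

```python
import random


def balanced_bootstrap_samples(sample, num_replications, rng=None):
    """Return N bootstrap samples in which every observation occurs exactly N times."""
    rng = rng or random.Random()
    n = len(sample)
    # Concatenate N copies of the original sample and permute the long vector.
    pool = list(sample) * num_replications
    rng.shuffle(pool)
    # Cut the permuted vector into N successive blocks of size n.
    return [pool[j * n:(j + 1) * n] for j in range(num_replications)]
```

Because every observation is used exactly $N$ times, the average of the $N$ bootstrap
sample means equals the mean of the original sample, apart from floating point rounding.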
C. CONFIDENCE INTERVALS

One of the applications of the sampling distribution is to approximate confidence 
intervals for a parameter. The following sections discuss two bootstrap-based methods 
for this purpose. 



1. The Percentile Method 

The percentile method is appealingly straightforward and, according to Efron [Ref. 4],
provides good results. It is based on the definition of the empirical cumulative
distribution function $G^*$ of the estimator,

$G^*(x) = P\{\hat{\theta}^* \le x\} = \frac{\#\{\hat{\theta}^*_j \le x\}}{N}$. (2.9)

The $p$th percentile can then be approximated by $\hat{\theta}^*_p$ defined by

$G^*(\hat{\theta}^*_p) \le p$. (2.10)

Efron [Ref. 4] proposes the use of $(\hat{\theta}^*_{\alpha}, \hat{\theta}^*_{1-\alpha})$ as
an approximate $100(1 - 2\alpha)\%$ confidence interval for $\theta$.
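
A minimal sketch of the percentile interval is given below. The quantile convention (the
smallest replicate whose empirical distribution function value reaches $p$) and the names
`empirical_quantile` and `percentile_interval` are assumptions made for illustration;
other conventions for the empirical percentile would give slightly different end points.

```python
import math


def empirical_quantile(replicates, p):
    """Smallest replicate t with G*(t) >= p, where G* is the empirical CDF."""
    ordered = sorted(replicates)
    # The small tolerance guards against floating point noise in p * N.
    rank = math.ceil(p * len(ordered) - 1e-9)
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def percentile_interval(replicates, alpha):
    """Approximate 100(1 - 2*alpha)% interval from the bootstrap replicates."""
    lower = empirical_quantile(replicates, alpha)
    upper = empirical_quantile(replicates, 1.0 - alpha)
    return lower, upper
```

With $\alpha = 0.05$ the function returns the 5th and 95th empirical percentiles of the
bootstrap distribution, i.e. a central 90% interval.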



2. The Bias-Corrected Percentile Method 

The bias-corrected percentile method covers those cases where the empirical bootstrap
distribution is not median-unbiased, i.e.,

$P\{\hat{\theta}^* \le \hat{\theta}\} \ne 0.5$. (2.11)

The percentile method may produce inaccurate percentile estimates in this case. To
compensate for these inaccuracies, Efron [Ref. 4] introduces the Bias-Corrected
Percentile Method. This method relies, as Schenker [Ref. 6] points out, on an assumption
which in general is at best approximately valid. The assumption is that there exists a
function $g$ such that

$g(\hat{\theta}) - g(\theta) \sim N(\nu, \tau^2)$ (2.12)

and

$g(\hat{\theta}^*) - g(\hat{\theta}) \sim N(\nu, \tau^2)$ (2.13)

with $\nu$ and $\tau$ being real-valued quantities that are constant for a specific case.
Let

$z_0 = \Phi^{-1}[G^*(\hat{\theta})]$ (2.14)

and

$z_{\alpha} = \Phi^{-1}(1 - \alpha)$ (2.15)

where $\Phi$ denotes the cumulative distribution function of the standard normal
distribution and $\hat{\theta}$ is the value of the estimator for the original sample.
Then the approximate $1 - 2\alpha$ confidence interval is given by

$\left( G^{*-1}[\Phi(2 z_0 - z_{\alpha})],\; G^{*-1}[\Phi(2 z_0 + z_{\alpha})] \right)$. (2.16)

It is easily seen that for median-unbiased sampling distributions, i.e., if

$P\{\hat{\theta}^* \le \hat{\theta}\} = 0.5$, (2.17)

$z_0 = 0$ and the bias-corrected percentile method is identical with the percentile
method. Schenker's intention [Ref. 6] is to demonstrate some deficiencies of
bootstrap-based confidence intervals for small sample sizes. Nevertheless, he does
provide results which seem to indicate that the bias-corrected percentile method is an
improvement over the percentile method.

For the cases where the underlying assumptions for the bias-corrected percentile method
do not hold, Efron and Tibshirani [Ref. 7] proposed another method called the $BC_a$
method. This thesis is concerned with the first two methods only.
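
Equations 2.14 to 2.16 can be sketched in code as follows. The sketch assumes the same
empirical quantile convention as a plain sorted-rank lookup, uses `statistics.NormalDist`
from the Python standard library for $\Phi$ and $\Phi^{-1}$, and clamps $G^*(\hat{\theta})$
away from 0 and 1 so that $z_0$ stays finite; the clamping and all function names are
illustrative assumptions, not part of the method as published.

```python
import math
from statistics import NormalDist


def empirical_quantile(replicates, p):
    """Smallest replicate t with G*(t) >= p, where G* is the empirical CDF."""
    ordered = sorted(replicates)
    rank = math.ceil(p * len(ordered) - 1e-9)
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def bias_corrected_interval(replicates, theta_hat, alpha):
    """Approximate 1 - 2*alpha interval by the bias-corrected percentile method."""
    std_normal = NormalDist()
    num_replications = len(replicates)
    # G*(theta_hat): fraction of replicates not exceeding the original estimate.
    g_value = sum(1 for r in replicates if r <= theta_hat) / num_replications
    # Assumed safeguard: keep G* strictly inside (0, 1) so that inv_cdf is defined.
    g_value = min(max(g_value, 0.5 / num_replications), 1.0 - 0.5 / num_replications)
    z0 = std_normal.inv_cdf(g_value)              # equation 2.14
    z_alpha = std_normal.inv_cdf(1.0 - alpha)     # equation 2.15
    lower_level = std_normal.cdf(2.0 * z0 - z_alpha)
    upper_level = std_normal.cdf(2.0 * z0 + z_alpha)
    # Equation 2.16: read the adjusted levels off the bootstrap distribution.
    return (empirical_quantile(replicates, lower_level),
            empirical_quantile(replicates, upper_level))
```

When exactly half of the replicates lie at or below $\hat{\theta}$, $z_0 = 0$ and the
interval reduces to the percentile interval, in line with equation 2.17.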
III. THE EXPONENTIAL DISTRIBUTION



In this chapter, the performance of the bootstrap method is compared to the theoretical
results in the case where the underlying distribution is the exponential distribution.



A. THEORETICAL RESULTS 

Let $X_1, X_2, X_3, \ldots, X_n$ be i.i.d. random samples from the exponential
distribution Exp[$\lambda$] with probability density function

$f_X(x) = \lambda e^{-\lambda x}$ for $x > 0$, and $f_X(x) = 0$ otherwise. (3.18)

The Maximum Likelihood Estimator (MLE) for the scale parameter $\lambda$ is

$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i}$. (3.19)

Using the fact that the sum of $n$ i.i.d. exponential random variables is distributed as
Gamma[$\lambda$, $n$], the probability density function of the random variable $W$,
defined by

$W = \frac{n}{\sum_{i=1}^{n} X_i}$, (3.20)

can be shown to be

$f_W(w) = \frac{(n\lambda)^n}{\Gamma(n)} \left( \frac{1}{w} \right)^{n+1} e^{-n\lambda / w}$ for $w > 0$, and $f_W(w) = 0$ otherwise. (3.21)

This is the exact sampling distribution of the maximum likelihood estimator for
$\lambda$. Figure 1 shows the graph of the sampling distribution for $\lambda = 1$ and
sample sizes $n = 10$, 30 and 50.

Figure 1. Probability Density Function of the Sampling Distribution: $\lambda = 1$,
sample size $n = 10, 30, 50$.

Computations of the moments yield

$E[W] = \frac{n}{n-1} \lambda$, (3.22)

which shows that the MLE is asymptotically unbiased, and

$\mathrm{Var}[W] = \frac{n^2 \lambda^2}{(n-1)^2 (n-2)}$. (3.23)

For this distribution exact probabilities can be computed using the following identity,

$P\{W \le w\} = 1 - I_n\!\left( \frac{n\lambda}{w} \right)$, (3.24)

where $I_n$ denotes the Incomplete Gamma function. Table 1 shows the true values for
some percentiles of the distribution of $W$ for $\lambda = 1$.



Table 1. PERCENTILES OF THE SAMPLING DISTRIBUTION: $\lambda = 1$, sample size $n$.

    n       5%       10%      90%      95%
    10    0.6367   0.7039   1.6074   1.8432
    20    0.7174   0.7721   1.3769   1.5089
    30    0.7587   0.8065   1.2915   1.3893
    40    0.7852   0.8283   1.2446   1.3247
    50    0.8042   0.8439   1.2142   1.2832
    60    0.8187   0.8439   1.1926   1.2539


B. THE SIMULATIONS 
1. Point Estimation 

The purpose of this simulation is to investigate the performance of bootstrap point
estimates. Cortes-Colon [Ref. 8] explored this subject for the sample mean of exponential
variates, using the mean squared error as the criterion for his evaluation. This thesis,
in contrast, approaches the problem by looking at the bias and the variance separately in
order to isolate effects.

The simulations in this section were conducted in SIMTBED [Ref. 9], a simulation software
package for the IBM Personal Computer and compatibles, which uses a multiplicative
congruential generator with multiplier 16807 and modulus $2^{31} - 1$ for the uniform
variates and an acceptance-rejection scheme for the gamma variates. For the experiments
in this section, ten super-replications were performed with differing numbers of trials
for each original sample size. The original sample sizes used were $n = 10$, 20, 30, 40
and 50 with respective numbers of replications for one super-replication $M = 480$, 240,
180, 120 and 96. With 10 super-replications, this sums up to a total of 4800, 2400, 1800,
1200 and 960 trials for each $n$ and for each of the bootstrap replications. For
validation purposes, similar simulations were performed on the author's personal
computer using the APL language and also on the NPS mainframe using independent FORTRAN
77 programs. The results were similar to those obtained from SIMTBED.
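
For illustration, a multiplicative congruential generator with the constants quoted above
can be sketched as follows. Only the multiplier 16807 and the modulus $2^{31} - 1$ are
taken from the text; the seeding, the scaling to the unit interval and the function names
are assumptions and need not match the SIMTBED implementation.

```python
MULTIPLIER = 16807
MODULUS = 2**31 - 1


def lehmer_stream(seed):
    """Yield successive states z_{k+1} = 16807 * z_k mod (2^31 - 1)."""
    state = seed
    while True:
        state = (MULTIPLIER * state) % MODULUS
        yield state


def uniform_stream(seed):
    """Yield uniform variates in (0, 1) by scaling the integer states (assumed scaling)."""
    for state in lehmer_stream(seed):
        yield state / MODULUS
```

Python integers do not overflow, so the product can be formed directly; an implementation
in a language with 32-bit integers would need a decomposition of the multiplication.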

a. Bias of the Bootstrap Estimate 

In the first part of the simulation experiment, the quantity of interest is the bias $B$,
the difference between the bootstrap estimate for the scale parameter $\lambda$ and its
true value ($\lambda = 1$). The bootstrap estimate $\hat{\lambda}^*$ was obtained
according to equation 2.6 and the bias $B$ was computed as $B = \hat{\lambda}^* - 1$ for
each combination of $n$ and $N$. Figure 2, created with GRAFSTAT [Ref. 10], shows the
average values for $B$ as a function of the number of bootstrap replications $N$ for
various values of $n$. Table 2 lists the lengths of the central 90% confidence intervals
for $B$, which are based on the super-replications.

The graph of the average values of $B$ shows that the number of bootstrap replications
$N$ has on average almost no effect on the "closeness" of the bootstrap estimate to the
actual value. Linear regression performed on the averages versus $N$ resulted in slope
parameters of the order of $10^{-3}$ and less. The bias is significantly affected by the
sample size $n$. The reason for this behavior is that the estimator is biased and that
the bias decreases with increasing sample size. The average bias for each value of $n$
is significantly larger than the amount expected from equation 3.22, which for this case
would be $1/(n-1)$. The observed average bias is approximately twice the expected value,
which seems to indicate that the bootstrap method introduces additional bias. The
variability of the bias, as measured by the length of a 90% confidence interval, is
presented in Table 2. These lengths decrease with increasing sample size $n$ but are not
affected by the number of bootstrap replications $N$.
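
A scaled-down version of this bias experiment can be sketched as follows. The sketch is
not the SIMTBED program: it uses Python's `random` module, a single run instead of
super-replications, and hypothetical function names, so its output will differ in detail
from the values reported here.

```python
import random


def mle_rate(sample):
    """MLE of the exponential parameter: n divided by the sum of the observations."""
    return len(sample) / sum(sample)


def bootstrap_point_estimate(sample, num_replications, rng):
    """Average of the MLE over N bootstrap samples (equation 2.6)."""
    n = len(sample)
    total = 0.0
    for _ in range(num_replications):
        total += mle_rate(rng.choices(sample, k=n))
    return total / num_replications


def average_bias(n, num_replications, num_trials, seed=0, true_rate=1.0):
    """Average of B = bootstrap estimate minus true value over independent trials."""
    rng = random.Random(seed)
    total_bias = 0.0
    for _ in range(num_trials):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimate = bootstrap_point_estimate(sample, num_replications, rng)
        total_bias += estimate - true_rate
    return total_bias / num_trials
```

Calling, for example, `average_bias(10, 50, 2000)` gives one estimate of the average bias
for $n = 10$ and $N = 50$, which can be compared with the value $1/(n-1)$ implied by
equation 3.22.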
Figure 2. Average Bias: Average values of bootstrap estimate minus actual value for the
true value $\lambda = 1$.

Table 2. VARIABILITY OF BIAS: Length of the 90% confidence interval for the bias $B$.
(Variances are less than $10^{-3}$)

    N \ n     10       20       30       40       50
    20      1.3625   0.8567   0.6764   0.5804   0.5049
    50      1.3718   0.8581   0.7064   0.5623   0.5100
    100     1.3137   0.8275   0.6829   0.5518   0.4991
    200     1.3481   0.8108   0.6694   0.5754   0.4665
    300     1.3825   0.8476   0.6588   0.5692   0.5218
    400     1.3574   0.8384   0.6839   0.5449   0.4996
    500     1.3583   0.8512   0.6855   0.5742   0.5204


b. Bootstrap Variance Estimation 

The quantity of interest here is the bias of the bootstrap estimate of the variance of
$\hat{\lambda}$, i.e. $\sigma^{*2} - \sigma^2$. The bootstrap estimate of the variance,
$\sigma^{*2}$, is computed according to equation 2.7 and $\sigma^2$ is the theoretical
value from equation 3.23. The average values of the bias of the bootstrap variance
estimate are displayed graphically in Figure 3, while the lengths of its 90% confidence
intervals, depicting the variability, are listed in Table 3.

The graph shows that on average the bootstrap overestimates the variance of the maximum
likelihood estimator of the scale parameter of the exponential distribution. After some
fluctuation for low numbers of bootstrap replications $N$, the average bias seems to
stabilize, and from then on the number of bootstrap replications does not have a
significant effect. Again the major impact on the bias comes from the sample size $n$.
The graph clearly shows the decrease in bias with increasing $n$. The variability of the
bias of the bootstrap variance estimate, represented by the lengths of the 90% confidence
interval of the bias, also does not seem to change with the number of bootstrap
replications $N$. Least squares regression of the lengths on the number of bootstrap
replications yields slope parameters of the order of $10^{-5}$, which does not indicate a
strong dependence. A choice of about $N = 200$ bootstrap replications should therefore be
appropriate.
Figure 3. Bias of Bootstrap Variance Estimate: Average values of the bias of the
bootstrap variance estimate $\sigma^{*2} - \sigma^2$ for $\lambda = 1$.

Table 3. BIAS OF BOOTSTRAP VARIANCE ESTIMATE: Length of the 90% confidence interval of
the bias of the bootstrap variance estimate $\sigma^{*2} - \sigma^2$ for $\lambda = 1$.
(Variances are less than $10^{-3}$)

    N \ n     10       20       30       40       50
    20      0.6988   0.1535   0.0742   0.0549   0.0383
    50      0.6757   0.1568   0.0793   0.0456   0.0334
    100     0.6266   0.1491   0.0725   0.0470   0.0310
    200     0.7026   0.1452   0.0706   0.0436   0.0324
    300     0.6806   0.1455   0.0718   0.0422   0.0328
    400     0.6742   0.1469   0.0730   0.0450   0.0302
    500     0.6572   0.1542   0.0710   0.0418   0.0297



The results of this section, briefly summarized, are: The number of bootstrap
replications has no major impact on the "closeness" of the bootstrap estimates to the
theoretical values. This observation is in agreement with the results by Cortes-Colon
[Ref. 8] and by Efron and Tibshirani [Ref. 7].

2. Confidence Intervals 

This section investigates bootstrap confidence intervals obtained by the 
percentile and bias-corrected percentile methods. 

a. Simulation Validation 

Validation is an important part of every simulation. Checking the results for
plausibility and comparing them with the theory and with results obtained by other
authors are some of the ways to accomplish validation. The latter approach was chosen for
this part of the thesis. To ensure that the percentile method and the bias-corrected
percentile method were properly understood and correctly implemented in computer code,
Efron's simulation [Ref. 4, page 84] was repeated. In the experiment, random samples of
size $n = 15$ are drawn from the exponential distribution Exp[$\lambda = 1$]. The samples
are standardized to ensure that the sample mean $\bar{x} = 0$ and the sample variance
$\sum (x_i - \bar{x})^2 / (n - 1) = 1$. The bootstrap method is then applied to the
standardized samples with the number of bootstrap replications $N = 1000$. Selected
percentiles are approximated using the percentile method and the bias-corrected
percentile method. Table 4 shows the results for 10 trials. The averages of the estimated
percentiles over the ten trials and the corresponding results obtained by Efron [Ref. 4,
p. 85] are also presented. The numbers obtained are quite close to Efron's results. The
simulation was programmed in FORTRAN 77 and conducted on the NPS IBM mainframe. The
random variates, exponential and uniform, were generated using the random number package
LLRANDOMII [Ref. 11]. Appendix D shows the listing of the program for the percentile
method and the bias-corrected percentile method.



Table 4. SIMULATION VALIDATION: Nonparametric confidence intervals of exponential
variates Exp[$\lambda = 1$], standardized, i.e. sample mean = 0 and sample variance = 1;
$n = 15$, $N = 1000$.

               Percentile Method                  Bias-corrected PM
    Trial      5%      10%     90%     95%       5%      10%     90%     95%
    1        -0.358  -0.300   0.368   0.457    -0.358  -0.300   0.368   0.457
    2        -0.403  -0.322   0.324   0.425    -0.384  -0.293   0.359   0.454
    3        -0.377  -0.298   0.305   0.435    -0.373  -0.290   0.328   0.453
    4        -0.375  -0.309   0.331   0.433    -0.373  -0.304   0.340   0.440
    5        -0.381  -0.300   0.329   0.431    -0.378  -0.292   0.343   0.439
    6        -0.408  -0.302   0.345   0.451    -0.391  -0.288   0.355   0.463
    7        -0.347  -0.302   0.330   0.478    -0.322  -0.271   0.399   0.565
    8        -0.391  -0.304   0.320   0.426    -0.356  -0.289   0.348   0.449
    9        -0.384  -0.320   0.309   0.410    -0.371  -0.298   0.336   0.442
    10       -0.425  -0.332   0.332   0.404    -0.401  -0.305   0.362   0.436
    Average  -0.385  -0.309   0.329   0.435    -0.371  -0.293   0.354   0.460
    Efron    -0.39   -0.32    0.33    0.43     -0.36   -0.29    0.36    0.47



b. Coverage 

The interpretation of a confidence interval, e.g. 90%, for a parameter of interest is
that in the long run, with a relative frequency of 0.9, the computed confidence intervals
cover the actual value of the parameter. Thus the relative frequency of coverage can be
used to assess the quality and applicability of a method which produces confidence
intervals. In this section, the coverage is investigated for the percentile method and
the bias-corrected percentile method.

The simulation looks at the central 90% confidence interval. For both methods this
interval is set up using the 5th and 95th percentiles of the empirical bootstrap
distribution for the scale parameter $\lambda$ of the exponential distribution. The
simulation was programmed in FORTRAN 77 and run on the NPS mainframe computer. Random
numbers were generated with LLRANDOMII [Ref. 11]. For each combination of sample size $n$
and number of bootstrap replications $N$, the simulation consists of 1000 repetitions,
for each of which the coverage of the actual value $\lambda = 1$ was checked. Table 5
shows the counts for the percentile method and Table 6 for the bias-corrected percentile
method.



Table 5. COVERAGE - PERCENTILE METHOD CONFIDENCE INTERVAL: Coverage of the true value
$\lambda = 1$ by the 90% confidence interval obtained from the percentile method, out of
1000 repetitions.

    N \ n    10     20     30     40     50
    50      826    845    859    883    888
    100     804    851    877    888    889
    200     794    845    881    853    881
    300     819    856    862    870    884
    500     784    848    840    876    860



Table 6. COVERAGE - BIAS-CORRECTED PERCENTILE METHOD CONFIDENCE INTERVAL: Coverage of
the true value $\lambda = 1$ by the 90% confidence interval obtained from the
bias-corrected percentile method, out of 1000 repetitions.

    N \ n    10     20     30     40     50
    50      831    831    855    875    870
    100     815    847    880    888    888
    200     792    851    881    859    873
    300     821    871    857    871    888
    500     793    850    847    883    860



The coverage in all cases is below the nominal level of 90%. The coverage appears to be
somewhat erratic for the smaller sample sizes, $n = 10$ and 20, but it seems to improve
with increasing $n$. Schenker [Ref. 6] observed a similar behavior in his investigation
dealing with the estimation of the variance of a normal distribution. The number of
bootstrap replications $N$ again seems not to have a significant effect. Significant
differences between the percentile method and the bias-corrected percentile method are
also not detectable in this experiment.
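
The coverage experiment for the percentile method can be sketched along the following
lines. This is an illustrative Python version, not the FORTRAN 77 program: the random
number generator, the empirical percentile convention and all function names are
assumptions, so the counts it produces will not reproduce Table 5 exactly.

```python
import math
import random


def mle_rate(sample):
    """MLE of the exponential parameter: n divided by the sum of the observations."""
    return len(sample) / sum(sample)


def empirical_quantile(values, p):
    """Smallest value t whose empirical CDF reaches p (assumed convention)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) - 1e-9)
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def percentile_interval_covers(sample, num_replications, alpha, true_rate, rng):
    """True if the percentile interval from one sample contains the true value."""
    n = len(sample)
    replicates = [mle_rate(rng.choices(sample, k=n)) for _ in range(num_replications)]
    lower = empirical_quantile(replicates, alpha)
    upper = empirical_quantile(replicates, 1.0 - alpha)
    return lower <= true_rate <= upper


def coverage_count(n, num_replications, num_repetitions, seed=0, alpha=0.05, true_rate=1.0):
    """Number of repetitions in which the central interval covers the true value."""
    rng = random.Random(seed)
    count = 0
    for _ in range(num_repetitions):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        if percentile_interval_covers(sample, num_replications, alpha, true_rate, rng):
            count += 1
    return count
```

For instance, `coverage_count(20, 100, 1000)` corresponds to the cell $n = 20$, $N = 100$
of Table 5, up to simulation noise and the differences noted above.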
c. Percentiles

The simulation in the previous section was set up to also provide the average values of
the 5th, 10th, 90th and 95th percentiles of the empirical bootstrap distribution. Table 7
lists these values for $N = 500$ bootstrap replications; these are averages of 1000
trials. Both methods, the percentile and the bias-corrected percentile method, on average
overestimate the percentiles compared to the theoretical values from Table 1. The amount
of overestimation is shown in the table in parentheses. This amount is in general larger
for the percentile method than for the bias-corrected percentile method, which means that
the correction applied by the latter method works in the right direction. The difference
between theoretical values and the bootstrap-based estimates decreases with increasing
original sample size $n$.



Table 7. AVERAGE PERCENTILES: Average values for percentiles obtained with the
percentile and the bias-corrected percentile method in 1000 trials; $\lambda = 1$, number
of bootstrap replications $N = 500$; numbers in parentheses are the amount of
overestimation, compared to the theoretical values.

            Percentile Method                      Bias-corrected PM
    n       5%       10%      90%      95%        5%       10%      90%      95%
    10     0.755    0.817    1.727    1.979      0.737    0.795    1.652    1.890
          (0.118)  (0.113)  (0.129)  (0.136)    (0.100)  (0.091)  (0.045)  (0.047)
    20     0.773    0.825    1.426    1.560      0.757    0.808    1.389    1.517
          (0.056)  (0.053)  (0.049)  (0.051)    (0.040)  (0.036)  (0.012)  (0.008)
    30     0.799    0.846    1.327    1.424      0.785    0.831    1.300    1.395
          (0.040)  (0.039)  (0.035)  (0.035)    (0.026)  (0.024)  (0.008)  (0.006)
    40     0.814    0.857    1.270    1.350      0.802    0.844    1.250    1.328
          (0.010)  (0.013)  (0.025)  (0.025)   (-0.002)  (0.000)  (0.005)  (0.003)
    50     0.833    0.872    1.240    1.308      0.824    0.863    1.225    1.292
          (0.029)  (0.028)  (0.026)  (0.025)    (0.020)  (0.019)  (0.011)  (0.009)



The behavior of percentile estimates was investigated further. The simulation for this
purpose was done in SIMTBED [Ref. 9] on the author's personal computer. The 5th and 95th
percentiles were selected as representative objects for investigation. The number of
trials is 1200 for $n = 10$, 600 for $n = 20$, 480 for $n = 30$, 300 for $n = 40$ and 240
for $n = 50$. Appendix A lists the results for the percentile method. These results show
that both the standard deviation of the percentile estimate and the width of the central
90% confidence interval decrease with increasing sample size $n$. The number of bootstrap
replications $N$ seems not to affect the results. Least squares regression of the values
on the number of bootstrap replications resulted in values for the slope of $10^{-4}$ and
less, and tests for distributional fit in GRAFSTAT [Ref. 10] did not show significant
differences for different numbers of bootstrap replications.

The simulation was repeated for the 5th and 95th percentiles using the 
bias-corrected percentile method. The results are listed in Appendix B. The conclusions 
for this method are basically the same as with the percentile method, decreasing standard 
deviation and width of the central 90% confidence interval with increasing sample size 
and no effect of the number of bootstrap replications. The only difference again is that 
the bias-corrected percentile method is on the average closer to the theoretical value 
than the percentile method. 






IV. NORMAL LINEAR REGRESSION 



The results of simulations to study the properties of bootstrap estimators of the 
parameters in a simple linear regression model are presented in this chapter. 

A. THEORETICAL OVERVIEW 
1. The Regression Model 

Let $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\}$ be $n$ pairs of
observations with $x$ as the independent variable and $y$ the dependent variable. Under
the assumptions of independence, normal distribution and homoscedasticity for the random
variables $Y_i$, the model for a linear relationship between $x_i$ and $Y_i$ is

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, 2, 3, \ldots, n$. (4.25)

The random variables $\varepsilon_i$ have mean 0 and variance $\sigma^2$ and are normally
distributed:

$\varepsilon_i \sim N(0, \sigma^2)$ for $i = 1, 2, 3, \ldots, n$. (4.26)

It is well known that the maximum likelihood estimates for the coefficients $\beta_0$ and
$\beta_1$ are

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$ (4.27)

and

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. (4.28)

Both estimators are unbiased, i.e.

$E[\hat{\beta}_0] = \beta_0$ and $E[\hat{\beta}_1] = \beta_1$. (4.29)

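The joint sampling distribution for $\hat{\beta}_0$ and $\hat{\beta}_1$ is known to be a
bivariate normal distribution. The marginal distributions are normal with means equal to
the respective true values and the variances

$\mathrm{Var}[\hat{\beta}_0] = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}$ (4.30)

and

$\mathrm{Var}[\hat{\beta}_1] = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$. (4.31)

The covariance between $\hat{\beta}_0$ and $\hat{\beta}_1$ is

$\mathrm{Cov}[\hat{\beta}_0, \hat{\beta}_1] = - \frac{\sigma^2 \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$. (4.32)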
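
The estimators in equations 4.27 and 4.28 and the theoretical moments in equations 4.30
to 4.32 can be evaluated with a short sketch such as the one below. The function names
are hypothetical and the code is meant only as an illustration of the formulas, assuming
that the error variance $\sigma^2$ is known.

```python
def fit_simple_regression(xs, ys):
    """Return (beta0_hat, beta1_hat) from equations 4.28 and 4.27."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def theoretical_moments(xs, sigma2):
    """Return (Var[beta0_hat], Var[beta1_hat], Cov[beta0_hat, beta1_hat])."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    var_beta0 = sigma2 * sum(x * x for x in xs) / (n * s_xx)   # equation 4.30
    var_beta1 = sigma2 / s_xx                                  # equation 4.31
    cov_beta = -sigma2 * x_bar / s_xx                          # equation 4.32
    return var_beta0, var_beta1, cov_beta
```

For the points $(1, 3), (2, 5), (3, 7), (4, 9)$, which lie exactly on the line
$y = 1 + 2x$, the fit returns $\hat{\beta}_0 = 1$ and $\hat{\beta}_1 = 2$.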



2. Bootstrap Method for Regression Models 

The implementation of the bootstrap method for regression models [Ref. 4] differs
slightly from the one in the one-sample case. It is described here for the normal linear
regression of one dependent and one independent variable, which is the topic of this
chapter.

To perform the bootstrap, first the least squares estimates $\hat{\beta}_0$ and
$\hat{\beta}_1$ (equations 4.27 and 4.28) are computed. These estimates are used to
compute the residuals $e_i$:

$e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$ for $i = 1, 2, 3, \ldots, n$. (4.33)

A bootstrap sample $\mathbf{e}^*$ of size $n$, the same size as the original sample, is
obtained by randomly drawing with replacement $n$ times from the $e_i$. Computing

$y^*_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e^*_i$ for $i = 1, 2, 3, \ldots, n$ (4.34)

results in $n$ pairs of 'observations'
$\{(x_1, y^*_1), (x_2, y^*_2), (x_3, y^*_3), \ldots, (x_n, y^*_n)\}$. These $n$ pairs of
'observations' are used to compute the bootstrap estimates $\hat{\beta}^*_0$ and
$\hat{\beta}^*_1$ using equations 4.27 and 4.28. The process of randomly drawing and
computing the estimates is repeated for a total of $N$ bootstrap replications. The
bootstrap estimates and